Bayesian 4

SNR 610

Model: Wallcreeper

  • The Wallcreeper is a bird species that has declined in recent years

  • We will look at (simulate) 16 years of survey data (1990 to 2005)

  • Each year we sample random quadrats and record the proportion of quadrats occupied

  • We will use a normal linear model for this

Simulate our population

\[ y_i = \beta_0 + \beta_1 \, year_i + \epsilon_i \]

or, equivalently, renaming the coefficients (\( \alpha = \beta_0 \), \( \beta = \beta_1 \)):

\[ y_i = \alpha + \beta \times year_i + \epsilon_i \]
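Written distributionally, which is closer to how the Stan code below is structured, the same model is:

\[ y_i \sim \mathrm{Normal}(\mu_i, \sigma^2), \qquad \mu_i = \alpha + \beta \times year_i, \qquad i = 1, \dots, n \]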

n <- 16      # number of years
a <- 40      # intercept
b <- -0.7    # slope
sigma <- 4   # residual standard deviation
sigma2 <- sigma^2
x <- 1:16

y is the proportion of occupied quadrats

Save true values in an object

truth <- c(alpha = a, beta = b, sigma = sigma)

Simulation

y <- a + b*x + rnorm(n, 0, sigma)
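rnorm() draws fresh noise each run, so the simulated y changes every time. Calling set.seed() first (any value; 42 below is arbitrary) makes the simulation reproducible:

```r
set.seed(42)  # arbitrary seed; fixes the random draws
n <- 16; a <- 40; b <- -0.7; sigma <- 4; x <- 1:16   # same values as above
y <- a + b*x + rnorm(n, 0, sigma)  # identical simulation, now reproducible
```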

Plot the data

Note: shift the x axis to show calendar years!

library(ggplot2)

df <- data.frame(x = x, y = y)
ggplot(df, aes(x = x + 1989, y = y)) +
  geom_point(size = 2.5) +
  ylab("Prop. occupied (%)") +
  xlab("Year") +
  theme_classic()
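To visualize the trend directly, the least-squares line can be overlaid with geom_smooth() (optional; assumes the df and ggplot2 setup from above):

```r
ggplot(df, aes(x = x + 1989, y = y)) +
  geom_point(size = 2.5) +
  geom_smooth(method = "lm", se = TRUE) +  # fitted line with 95% band
  ylab("Prop. occupied (%)") +
  xlab("Year") +
  theme_classic()
```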

Run the frequentist model

model <- lm(y ~ x)
summary(model) 

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7822 -1.6253 -0.2616  1.5073  6.8714 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  38.0633     1.6606  22.922 1.68e-12 ***
x            -0.6367     0.1717  -3.707  0.00234 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.167 on 14 degrees of freedom
Multiple R-squared:  0.4954,    Adjusted R-squared:  0.4593 
F-statistic: 13.74 on 1 and 14 DF,  p-value: 0.002344
lm_est <- c(coef(model), sigma = sigma(model))
lm_est
(Intercept)           x       sigma 
 38.0633267  -0.6366534   3.1666135 
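For later comparison with the Bayesian credible intervals, the frequentist 95% confidence intervals come from confint() (this assumes model is the lm fit from above):

```r
ci <- confint(model, level = 0.95)  # t-based 95% CIs for (Intercept) and x
ci
```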

Run the stan model

Load the rstan package (install it first if needed)

library(rstan)

Let’s run the model

First step: Bundle the data:

Use a list to bundle the response variable, the explanatory variable, and n (the number of observations)

dataList <- list(y = y, x = x, n = n)

Step 2: write the Stan model (no need to run this slide; the complete, runnable version appears under "All code")

Let's break it up piece by piece

# Write text file with model description in Stan language
cat(file = "model.stan", # This line is R code
"data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x; //
}
parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}
transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood (could be vectorized to increase speed)
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

" )

Data and parameters

data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x;
}

parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}

Transformed parameters

transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}

Model

model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

All code

# Write text file with model description in Stan language
cat(file = "model.stan", # This line is R code
"data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x; //
}
parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}
transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood (could be vectorized to increase speed)
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

" )
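As the comment in the model block notes, the likelihood can be vectorized. A sketch of that alternative model block (same model, typically faster, since Stan evaluates one vectorized statement instead of n scalar ones):

```
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Vectorized likelihood: replaces the for loop over observations
  y ~ normal(mu, sigma);
}
```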

Next step

Choose the HMC settings and run the sampler

# HMC settings
ni <- 3000  # iterations per chain
nb <- 1000  # warmup (burn-in) iterations
nc <- 4     # number of chains
nt <- 1     # thinning interval
# Call STAN 

modelstan <- stan(file = "model.stan", data = dataList,
                  chains = nc, iter = ni, warmup = nb, thin = nt)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.3 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)
Chain 1: Iteration:  300 / 3000 [ 10%]  (Warmup)
Chain 1: Iteration:  600 / 3000 [ 20%]  (Warmup)
Chain 1: Iteration:  900 / 3000 [ 30%]  (Warmup)
Chain 1: Iteration: 1001 / 3000 [ 33%]  (Sampling)
Chain 1: Iteration: 1300 / 3000 [ 43%]  (Sampling)
Chain 1: Iteration: 1600 / 3000 [ 53%]  (Sampling)
Chain 1: Iteration: 1900 / 3000 [ 63%]  (Sampling)
Chain 1: Iteration: 2200 / 3000 [ 73%]  (Sampling)
Chain 1: Iteration: 2500 / 3000 [ 83%]  (Sampling)
Chain 1: Iteration: 2800 / 3000 [ 93%]  (Sampling)
Chain 1: Iteration: 3000 / 3000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.053 seconds (Warm-up)
Chain 1:                0.089 seconds (Sampling)
Chain 1:                0.142 seconds (Total)
Chain 1: 

(Chains 2 to 4 print analogous progress output, each finishing in well under a second.)

Get results: traceplots

rstan::traceplot(modelstan)
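Beyond traceplots, the posterior draws themselves can be pulled out with rstan::extract() and summarized directly; for example (assuming modelstan is the fitted object from above):

```r
post <- rstan::extract(modelstan)      # named list of posterior draws
mean(post$beta < 0)                    # posterior prob. of a declining trend
quantile(post$beta, c(0.025, 0.975))   # 95% credible interval for the slope
```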

Get results: posterior summary

print(modelstan, dig = 2)
Inference for Stan model: anon_model.
4 chains, each with iter=3000; warmup=1000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=8000.

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
alpha   38.03    0.04 1.83  34.44  36.84  38.02  39.22  41.60  2555    1
beta    -0.63    0.00 0.19  -1.01  -0.76  -0.63  -0.51  -0.25  2610    1
sigma    3.45    0.01 0.71   2.38   2.95   3.36   3.83   5.16  3090    1
mu[1]   37.40    0.03 1.67  34.14  36.31  37.40  38.47  40.69  2607    1
mu[2]   36.77    0.03 1.51  33.79  35.77  36.76  37.74  39.75  2682    1
mu[3]   36.13    0.03 1.36  33.44  35.24  36.13  37.01  38.84  2804    1
mu[4]   35.50    0.02 1.22  33.11  34.71  35.50  36.28  37.97  3007    1
mu[5]   34.87    0.02 1.10  32.72  34.15  34.86  35.57  37.09  3358    1
mu[6]   34.23    0.02 1.00  32.29  33.58  34.22  34.86  36.27  4002    1
mu[7]   33.60    0.01 0.92  31.75  33.00  33.59  34.18  35.50  5203    1
mu[8]   32.96    0.01 0.89  31.22  32.40  32.95  33.52  34.76  7329    1
mu[9]   32.33    0.01 0.89  30.58  31.77  32.32  32.88  34.11  9811    1
mu[10]  31.70    0.01 0.93  29.87  31.11  31.70  32.28  33.55 10078    1
mu[11]  31.06    0.01 1.00  29.09  30.42  31.07  31.70  33.07  9278    1
mu[12]  30.43    0.01 1.10  28.22  29.73  30.43  31.13  32.63  7141    1
mu[13]  29.80    0.02 1.23  27.34  29.01  29.80  30.58  32.25  5921    1
mu[14]  29.16    0.02 1.37  26.46  28.29  29.17  30.04  31.90  5108    1
mu[15]  28.53    0.02 1.52  25.52  27.55  28.53  29.51  31.55  4565    1
mu[16]  27.90    0.03 1.67  24.58  26.81  27.90  28.97  31.23  4190    1
lp__   -26.06    0.03 1.33 -29.45 -26.67 -25.72 -25.09 -24.55  2218    1

Samples were drawn using NUTS(diag_e) at Fri Nov 14 09:41:32 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Get results: compare with truth and lm

stan_est <- summary(modelstan)$summary[1:3,1]
comp <- cbind(truth = truth, lm = lm_est, Stan = stan_est)
print(comp, 4)
      truth      lm    Stan
alpha  40.0 38.0633 38.0321
beta   -0.7 -0.6367 -0.6334
sigma   4.0  3.1666  3.4536

Challenge